This listing of claims replaces all prior versions, and
listings, of claims in the instant application:

Listing of Claims:

1. (Currently Amended) A method of executing a single 
instruction parallel multiply-add function on a processor, the 
method comprising: 

providing the processor with an opcode indicating a
parallel multiply-add instruction having at least three
operands;

providing the processor with a first, a second and a 
third value of a first operand, a second operand and a 
third operand, respectively in said at least three 
operands, wherein each of the values comprises two or more 
operand components; 

multiplying first operand components of the first and 
the second values to generate a first intermediate value; 

multiplying second operand components of the first
and the second values to generate a second intermediate
value;

adding a first operand component of the third value
to the first intermediate value to generate a first result
value;

adding a second operand component of the third value 
to the second intermediate value to generate a second 
result value; 

storing the first result value in a first portion of 
a result location; and 

storing the second result value in a second portion 
of the result location. 
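
As an informal illustration of the data flow recited in claim 1 (a sketch, not claim language), the steps can be modelled in Python. The sketch assumes two 16-bit unsigned components packed into each 32-bit value and wraparound on overflow; the claim fixes neither the component width nor the overflow behaviour, and the names `unpack2`, `pack2` and `parallel_multiply_add` are hypothetical.

```python
MASK16 = 0xFFFF  # assumed component width: 16 bits


def unpack2(value):
    """Split a 32-bit value into its (first, second) 16-bit components."""
    return (value >> 16) & MASK16, value & MASK16


def pack2(first, second):
    """Pack two 16-bit components into the two portions of one 32-bit value."""
    return ((first & MASK16) << 16) | (second & MASK16)


def parallel_multiply_add(a, b, c):
    """Model of the claimed steps: per-component a*b + c, stored in one result."""
    a1, a2 = unpack2(a)
    b1, b2 = unpack2(b)
    c1, c2 = unpack2(c)
    inter1 = a1 * b1  # first intermediate value
    inter2 = a2 * b2  # second intermediate value
    result1 = (inter1 + c1) & MASK16  # first result value (wraps, by assumption)
    result2 = (inter2 + c2) & MASK16  # second result value
    return pack2(result1, result2)
```

For example, with components (2, 3), (4, 5) and (1, 1), the result location holds the components (9, 16).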

2. (Original) The method of claim 1, wherein the first,
second and third values are stored in respective source
registers of the processor specified by the parallel multiply-add
instruction, and the first and the second result values are
stored in a destination register of the processor specified by
the parallel multiply-add instruction.

3. (Cancelled) 

4. (Original) The method of claim 1, wherein the
processor is pipelined and the single instruction is
executed with a throughput of one instruction every 2
cycles.

5. to 17. (Cancelled) 

18. (Currently Amended) A processor comprising: 
a first and second multiplier paths; 
a first and second adder paths; and 

wherein the processor supports a parallel multiply- 
add instruction, the parallel multiply add instruction 
having at least three operands executable to cause the 
processor to, 

in parallel, route a first component of a first 
operand and a first component of a second operand to 
the first multiplier path and a second component of 
the first operand and a second component of the 
second operand to the second multiplier path, 

in parallel, route output of the first 
multiplier path and a first component of a third 
operand to the first adder path, and output of the 
second multiplier path and a second component of the 
third operand to the second adder path, and 

store output of the first adder path at a first 
location and output of the second adder path at a 
second location. 



Page 3 of 15 



Appl. No. 09/640,901 

Amdt. dated July 3, 2008 

Reply to Office Action of April 8, 2008 



19. (Cancelled)

20. (Currently Amended) The processor of claim [[19]] 18, 
wherein the results of the parallel multiply-add instruction
are saturated. 

21. (Currently Amended) The processor of claim [[19]] 18, 
wherein the processor provides multiple saturation modes. 
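
By way of illustration only, saturation of a result as in claims 20 and 21 can be sketched as a clamp to the representable range. The 16-bit width and the choice of signed and unsigned modes are assumptions made for this sketch; the claims do not enumerate the saturation modes, and `saturate` is a hypothetical name.

```python
def saturate(value, bits=16, signed=True):
    """Clamp value to the range of a bits-wide signed or unsigned integer."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return max(low, min(high, value))
```

Under these assumptions, a multiply-add result of 300 * 200 + 5 saturates to 32767 in the signed mode rather than wrapping around.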

22. (Previously Presented) The processor of claim 18,
wherein the processor further supports a conditional
pick instruction, the conditional pick instruction
executable to cause the processor to compare a first value
to zero and to copy either a second value or a third value
to a destination location depending on the comparison.
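
The conditional pick of claim 22 can be illustrated, informally, as a branch-free select. The claim does not say which outcome of the comparison selects which value; the sketch below assumes that a nonzero first value selects the second value, and `conditional_pick` is a hypothetical name.

```python
def conditional_pick(first, second, third):
    """Return second if first is nonzero, otherwise third (assumed polarity)."""
    return second if first != 0 else third
```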

23. (Previously Presented) The processor of claim 18,
wherein the processor further supports a parallel
averaging instruction, the parallel averaging instruction
executable to cause the processor to average a first
operand's first component and a second operand's first
component, and, in parallel, to average the first
operand's second component and the second operand's second
component.

24. (Previously Presented) The processor of claim 18,
wherein the processor further supports a parallel 
shift instruction, the parallel shift instruction 
executable to cause the processor to logically shift a 
first portion of a first value in accordance with a first 
portion of a second value, and, in parallel, shift a 
second portion of the first value in accordance with a 
second portion of the second value. 
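
As an informal sketch of the parallel shift of claim 24, each portion of the first value is shifted by the amount held in the corresponding portion of the second value. The claim does not state the shift direction or the portion width; the sketch assumes a logical left shift on 16-bit portions, and `parallel_shift` is a hypothetical name.

```python
MASK16 = 0xFFFF  # assumed portion width: 16 bits


def parallel_shift(values, amounts):
    """Logically shift each 16-bit portion left by its own shift amount."""
    (v1, v2), (s1, s2) = values, amounts
    return ((v1 << s1) & MASK16, (v2 << s2) & MASK16)
```

For example, portions (1, 0x8001) shifted by (4, 1) give (16, 2), since bits shifted past the assumed 16-bit width are discarded.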

25. (Previously Presented) The processor of claim 18
wherein the processor further supports a parallel power
instruction, the parallel power instruction executable to cause
the processor to,

raise a first component of a first operand to a power
indicated in a first component of a second operand and, in
parallel, raise a second component of the first operand
to a power indicated in a second component of the second
operand.

26. (Previously Presented) The processor of claim 18 
wherein the processor further supports a parallel reciprocal 
square root instruction, the parallel reciprocal square root 
instruction executable to cause the processor to, 

determine a reciprocal square root of an operand's 
first component and, in parallel, determine a reciprocal 
square root of the operand's second component. 
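
The parallel reciprocal square root of claim 26 can be illustrated informally as the same operation applied to each component. The sketch assumes two floating-point components held in a tuple and exact library arithmetic; a hardware implementation might instead produce an approximation, which the claim does not specify. `parallel_rsqrt` is a hypothetical name.

```python
import math


def parallel_rsqrt(operand):
    """Return 1/sqrt(x) for each of the operand's two components."""
    first, second = operand
    return (1.0 / math.sqrt(first), 1.0 / math.sqrt(second))
```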

27. (Previously Presented) A computer program product
encoded on one or more machine-readable media, the computer
program product comprising:

an instruction sequence, the instruction sequence
including an instance of a parallel multiply add
instruction;

the instance of the parallel multiply add instruction 
having an at least four operand instruction format, 

wherein execution of the parallel multiply add 
instruction 

causes generation of a first product from a
first operand's first component and a second
operand's first component, in parallel with
generation of a second product from the first
operand's second component and the second
operand's second component,

causes generation of a first sum from the 
first product and a third operand's first 
component, in parallel with generation of a 
second sum from the second product and the third 
operand's second component, and 

causes the first sum to be stored in 
accordance with a fourth operand's first 
component and the second sum to be stored in 
accordance with the fourth operand's second 
component.

28. to 32. (Cancelled) 

33. (Currently Amended) A method of executing an 
instruction instance comprising: 

generating a first product and a second product in 
parallel, wherein the first product is from a first value 
in a first portion of a first operand of said instruction 
and a second value in a first portion of a second operand 
of said instruction and the second product is from a third 
value in a second portion of said first operand of said 
instruction and a fourth value in a second portion of the 
second operand of said instruction; and

generating a first sum and a second sum in parallel, 
wherein the first sum is from the first product and a 
fifth value in a first portion of a third operand of said 
instruction and the second sum is from the second product 
and a sixth value in a second portion of the third operand 
of said instruction. 

34. (Cancelled)



35. (Previously Presented) The method of claim 33 
further comprising storing, in parallel, the first sum in a 
first location and the second sum in a second location. 

36. (Previously Presented) The method of claim 35, 
wherein the first location is a first portion of a destination 
register and the second location is a second portion of the 
destination register. 

37. (Previously Presented) The method of claim 33 wherein 
the instruction instance is executed by a pipelined processor 
that performs operations for the instruction instance in 2 
cycles.

38. (Previously Presented) The method of claim 33 
embodied as a computer program product encoded in one or more 
machine-readable media.

39. (Previously Presented) The processor of claim 18, 
wherein the first store location is a first part of a register 
and the second store location is a second part of the register. 

40. (Previously Presented) The processor of claim 18, 
wherein the first store location is a first register and the 
second store location is a second register. 

43. (Previously Presented) The processor of claim 23
further comprising: 

a plurality of adder paths; and 
a plurality of shifter paths; 

wherein the parallel averaging instruction, when 
executed, causes the processor to, 

route the first operand's first component 
and the second operand's second component to a 
first of the plurality of adder paths, and, in
parallel, route the first operand's second 
component and the second operand's second 
component to a second of the plurality of adder 
paths; 

after propagation delay, route output of 
the first adder path and a one value to a third 
of the plurality of adder paths, and, in 
parallel, route output of the second adder path 
and a one value to a fourth of the plurality of
adder paths; 

after propagation delay, route output of the third 
adder path and a first control value to a first of the
plurality of shifter paths, and, in parallel, route output 
of the fourth adder path and a second control value to a 
second of the plurality of shifter paths. 
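
The three stages recited in claim 43 (add the components, add a one value, then shift) can be sketched informally as a rounded average. The sketch pairs like-numbered components, as claim 23 recites, and assumes each control value is a right-shift count of 1; the claim does not state what the control values encode. `parallel_average` is a hypothetical name.

```python
def parallel_average(a, b, control1=1, control2=1):
    """Rounded average of like components: (x + y + 1) >> control."""
    (a1, a2), (b1, b2) = a, b
    sum1, sum2 = a1 + b1, a2 + b2        # first and second adder paths
    inc1, inc2 = sum1 + 1, sum2 + 1      # third and fourth adder paths
    return (inc1 >> control1, inc2 >> control2)  # first and second shifter paths
```

For example, averaging the components (3, 10) with (4, 20) gives (4, 15); adding the one value before the shift rounds halves upward.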